Vancouver Trees Dataset Analysis and Visualization Report

Oct 22, 2022
Notebook by Renbo Xu

Introduction

I am from China. I came to Canada to study several years ago, and what impressed me most is its beautiful natural environment. With its scenic views, mild climate, and friendly people, Vancouver is well known around the world as both a popular tourist destination and one of the best places to live (https://vancouver.ca/news-calendar/our-city.aspx). Its diverse natural environment, including mountains, the ocean, and varied wildlife, has attracted many immigrants. Vancouver's trees are one of the city's most important features and contribute a great deal to its beauty. I am curious about how trees are distributed across Vancouver's neighbourhoods, which genera they belong to, when they were planted, and so on. Answering these questions calls for an exploratory data analysis and visualization of the Vancouver tree dataset.

In this report, I explore the distribution of Vancouver's trees by analyzing a subset of the Vancouver Street Trees dataset (https://opendata.vancouver.ca/explore/dataset/street-trees/information/?disjunctive.species_name&disjunctive.common_name&disjunctive.height_range_id&disjunctive.on_street&disjunctive.neighbourhood_name). The data were obtained from the City of Vancouver's Open Data Portal and are released under the Open Government Licence – Vancouver (https://opendata.vancouver.ca/pages/licence/). The subset analyzed here is available at https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv.

Questions of interest

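By exploring this dataset, I am interested in answering the following questions:

1. How many trees were planted in each year and month?
2. Is there a relationship between tree diameter and height?
3. How many trees are there in each neighbourhood and of each genus?
4. How are the top 5 tree genera distributed across neighbourhoods?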

For the final dashboard, I would like to present the tree details of a selected Vancouver neighbourhood, such as the top 10 genera, the tree diameter and height distribution, the year planted, and the street side.

Analysis

Data imports

Data Description and analysis

First, let's take a look at the general information of this dataset and all the columns.

The information above shows that there are 5000 entries and 21 columns. The first column has no name; it is the row index from the original dataset (we are analyzing only a subset). This column will therefore be dropped before further analysis.
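As a minimal sketch of this step, assuming the notebook uses pandas: when a CSV header starts with an empty field, `read_csv` names that column `Unnamed: 0`, and it can then be dropped. The rows below are made-up stand-ins for the real file, not values from the dataset.

```python
import io

import pandas as pd

# Made-up rows that mimic the layout of the subset CSV: the header's
# first field is empty because it holds the original row index.
csv_text = """,tree_id,genus_name,diameter
10747,1,ACER,3.0
12573,2,PRUNUS,12.5
"""

trees = pd.read_csv(io.StringIO(csv_text))

# pandas labels the header-less first column "Unnamed: 0"; drop it.
trees = trees.drop(columns=["Unnamed: 0"])

# Print dtypes and non-null counts, as in the summary discussed above.
trees.info()
```

For the real data, the same calls would take the subset URL given in the introduction in place of the in-memory text.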

Next, let us look at the description of each column on the dataset website (https://opendata.vancouver.ca/explore/dataset/street-trees/information/?disjunctive.species_name&disjunctive.common_name&disjunctive.height_range_id&disjunctive.on_street&disjunctive.neighbourhood_name). The column details are summarized in Table 1.

Table 1 Vancouver dataset column details

| Column name | Datatype | Details |
|---|---|---|
| std_street | object | Street name of the site with which the tree is associated |
| on_street | object | Name of the street on which the tree is physically located |
| species_name | object | Species name |
| neighbourhood_name | object | City's defined local area in which the tree is located |
| date_planted | datetime | The date of planting in YYYYMMDD format |
| diameter | float | DBH in inches (DBH stands for diameter of tree at breast height) |
| street_side_name | object | The street side on which the tree is physically located (Even, Odd or Median (Med)) |
| genus_name | object | Genus name |
| assigned | object | Indicates whether the address is made up to associate the tree with a nearby lot (Y = Yes, N = No) |
| civic_number | int | Street address of the site with which the tree is associated |
| plant_area | object | B = behind sidewalk, G = in tree grate, N = no sidewalk, C = cutout; a number indicates boulevard width in feet |
| curb | object | Curb presence (Y = Yes, N = No) |
| tree_id | int | Numerical ID |
| common_name | object | Common name |
| height_range_id | int | 0-10, one step for every 10 feet (e.g., 0 = 0-10 ft, 1 = 10-20 ft, 2 = 20-30 ft, and 10 = 100+ ft) |
| on_street_block | int | The street block on which the tree is physically located |
| cultivar_name | object | Cultivar name |
| root_barrier | object | Root barrier installed (Y = Yes, N = No) |
| latitude | float | Location latitude |
| longitude | float | Location longitude |
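
The height_range_id encoding in Table 1 can be decoded into a readable range. The helper below is a hypothetical illustration (the name `height_range_label` is not part of the dataset or the notebook) and assumes the encoding is exactly as described in the table.

```python
def height_range_label(height_range_id: int) -> str:
    """Return the height range in feet for a height_range_id (0 to 10)."""
    if not 0 <= height_range_id <= 10:
        raise ValueError("height_range_id must be between 0 and 10")
    if height_range_id == 10:
        # The last bucket is open-ended.
        return "100+ ft"
    low = height_range_id * 10
    return f"{low}-{low + 10} ft"


labels = [height_range_label(i) for i in (0, 2, 10)]
print(labels)
```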

This dataset has 7 numerical columns: diameter, civic_number, tree_id, height_range_id, on_street_block, latitude, and longitude. The remaining columns are categorical, except date_planted, which is temporal.

Most columns have 5000 entries, while date_planted, plant_area, and cultivar_name have fewer: 2363, 4950, and 2658, respectively. Both date_planted and cultivar_name are missing about half of their values. I will keep date_planted because it is one of my variables of interest, but I will drop cultivar_name. I will also drop the rows where plant_area is NaN. Finally, I will drop the columns of no interest: std_street, on_street, assigned, and civic_number.
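
The cleaning steps above might look like the following in pandas. This is a sketch on made-up rows, not the notebook's original code; the date strings in particular are illustrative and may not match the format in the real file.

```python
import pandas as pd

# Made-up rows that only mimic the relevant columns of the real file.
trees = pd.DataFrame(
    {
        "tree_id": [1, 2, 3],
        "std_street": ["A ST", "B ST", "C ST"],
        "on_street": ["A ST", "B ST", "C ST"],
        "assigned": ["N", "N", "Y"],
        "civic_number": [100, 200, 300],
        "cultivar_name": [None, "X", None],
        "plant_area": ["B", None, "10"],
        "date_planted": [None, "2005-02-01", "1999-01-13"],
    }
)

# Columns to discard: the sparse cultivar_name plus those of no interest.
drop_cols = ["cultivar_name", "std_street", "on_street", "assigned", "civic_number"]

cleaned = (
    trees.drop(columns=drop_cols)
    # Remove only the rows that lack a plant_area value.
    .dropna(subset=["plant_area"])
)

# Parse planting dates; missing dates become NaT and the rows are kept.
cleaned = cleaned.assign(date_planted=pd.to_datetime(cleaned["date_planted"]))
print(cleaned)
```

Rows with a missing date_planted are kept here, so they still count toward the neighbourhood and genus totals.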

Now the dataset is ready for visualization.

Exploratory Visualizations

Question 1: How many trees were planted in each year and month?

Figure 1 Vancouver trees planted by year and month

Figure 1 shows that the number of trees planted increases steadily from 1989 to 1996. From 1996 to 2013 it fluctuates slightly, then drops significantly from 2013 to 2016, after which it begins to rise slightly again. By month, February has the most trees planted, while July and August have the fewest.
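
The counts behind a chart like Figure 1 can be obtained by extracting the year and month from date_planted. The sketch below uses made-up dates and assumes pandas; it is not the notebook's plotting code.

```python
import pandas as pd

# Made-up planting dates; None stands for a tree with no recorded date.
dates = pd.to_datetime(
    pd.Series(["1996-02-10", "1996-02-20", "1996-11-05", "2013-02-01", None])
)

# Trees without a planting date cannot be assigned to a year or month.
planted = dates.dropna()

# Count trees per calendar year and per month of the year (1 = January).
per_year = planted.dt.year.value_counts().sort_index()
per_month = planted.dt.month.value_counts().sort_index()

busiest_month = per_month.idxmax()
print(per_year)
print(per_month)
```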

Question 2: Is there a relationship between tree diameter and height?

Usually, a tree with a larger diameter is also taller. Is this true for the Vancouver tree dataset? Let's look at the relationship between diameter and height_range_id.

Figure 2 Relationship of diameter and height of Vancouver trees

The scatterplot shows a positive relationship between tree height and tree diameter: trees with larger diameters usually have a higher height range id. However, this is only an overall trend and does not apply to every single point. Therefore, I add a boxplot to reveal more statistics and overlay a line of the mean diameter on the scatterplot.

Figure 3 Scatterplot and boxplot of diameter and height of Vancouver trees

From Figure 3, the overall relationship between tree diameter and height range id (reflected by the mean and median values) is positive, except for a slight decrease in diameter from height range id 8 to 9 (this could be due to fewer data points).
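
One way to check whether a dip such as the one from id 8 to 9 rests on few observations is to tabulate the count next to the mean and median for each height range. The sketch below uses made-up diameters and assumes pandas.

```python
import pandas as pd

# Made-up measurements; only the column names follow the dataset.
trees = pd.DataFrame(
    {
        "height_range_id": [1, 1, 2, 2, 2, 8, 8, 9],
        "diameter": [3.0, 5.0, 8.0, 10.0, 12.0, 30.0, 34.0, 28.0],
    }
)

# Per height range: number of trees, mean diameter, median diameter.
summary = trees.groupby("height_range_id")["diameter"].agg(
    ["count", "mean", "median"]
)
print(summary)
```

In this toy data the mean drops from id 8 to id 9, but the id 9 group contains a single tree, so the drop says little about the trend.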

Knowing only the relationship between tree diameter and height is not enough for this report; I am more interested in the tree details of different neighbourhoods. Therefore, I present the tree diameter and height range id in a rug plot, with widgets to select a neighbourhood and a genus. I am also curious, for a selected neighbourhood and genus, how many trees are on each street side (street_side_name) and when they were planted. I created the following plot (Figure 4) to answer these questions and will add it to the final dashboard.

Figure 4 Tree details for different neighbourhood and genus

By selecting a neighbourhood and a genus, we can see, for that genus in that neighbourhood, its diameter and height range id, the number of trees on each street side, and how many trees were planted each year.

Question 3: How many trees are there in each neighbourhood and of each genus?

I would like to find out the number of trees in each neighbourhood and of each genus using the following bar plots.

Figure 5 Number of trees of different neighbourhood and genus

The top 5 neighbourhoods by number of trees are Kensington_Cedar Cottage, Renfrew_Collingwood, Hastings_Sunrise, Dunbar_Southlands, and Sunset. The top 5 genera planted are ACER, PRUNUS, TILIA, FRAXINUS, and QUERCUS. How are these top 5 genera distributed across neighbourhoods? Let us analyze this in the next question.

Question 4: How are the top 5 tree genera distributed across neighbourhoods?

Figure 6 Number of trees of top 5 genus in different neighbourhood

From Figure 6, among the 5 genera, ACER and PRUNUS are the top 2 planted in almost all neighbourhoods (except Downtown). The number of trees of these 5 genera varies by neighbourhood.

Although Figure 6 gives a lot of information about the top 5 tree genera in different neighbourhoods, it is not flexible. What if I want to know the top 10 genera? What if I am only interested in a selected neighbourhood, such as West End? In that case, there is no need to show data for the other neighbourhoods. Therefore, I have improved the plot by adding selection features as follows.
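
The table underlying a chart like Figure 6 can be sketched as a cross-tabulation restricted to the most common genera. The rows below are made up, and the top 2 (rather than the top 5) are used to keep the example small; pandas is assumed.

```python
import pandas as pd

# Made-up rows; only the column names follow the dataset.
trees = pd.DataFrame(
    {
        "neighbourhood_name": [
            "SUNSET", "SUNSET", "SUNSET", "DOWNTOWN", "DOWNTOWN", "WEST END",
        ],
        "genus_name": ["ACER", "ACER", "PRUNUS", "ACER", "PRUNUS", "TILIA"],
    }
)

# The n most common genera over the whole dataset.
top_genera = trees["genus_name"].value_counts().head(2).index

# Keep only trees of those genera, then count by neighbourhood and genus.
subset = trees[trees["genus_name"].isin(top_genera)]
table = pd.crosstab(subset["neighbourhood_name"], subset["genus_name"])
print(table)
```

A neighbourhood with no trees of the selected genera (WEST END in this toy data) does not appear in the table at all.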

Figure 7 Number of trees of top 10 genus for different neighbourhood

From Figure 7, we can find the top 10 genera (in descending order) of a selected neighbourhood by clicking on the bar chart. I have also added an interactive feature that shows the number of trees of each genus. This is a more convenient and efficient way to get the required information. Figure 7 will be included in the final dashboard.
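
The computation behind the selection in Figure 7 can be sketched without the interactive layer: filter to one neighbourhood, then rank its genera by count. The function name `top_genera_in` is hypothetical, the rows are made up, and pandas is assumed.

```python
import pandas as pd


def top_genera_in(trees: pd.DataFrame, neighbourhood: str, n: int = 10) -> pd.Series:
    """Return tree counts of the n most common genera in one neighbourhood."""
    in_area = trees[trees["neighbourhood_name"] == neighbourhood]
    # value_counts sorts in descending order of count by default.
    return in_area["genus_name"].value_counts().head(n)


# Made-up rows; only the column names follow the dataset.
trees = pd.DataFrame(
    {
        "neighbourhood_name": ["WEST END"] * 6 + ["SUNSET"] * 2,
        "genus_name": [
            "ACER", "ACER", "ACER", "TILIA", "TILIA", "QUERCUS",
            "PRUNUS", "PRUNUS",
        ],
    }
)

result = top_genera_in(trees, "WEST END", n=2)
print(result)
```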

Discussion

The main purpose of this analysis and visualization is to explore detailed information about Vancouver's trees. The analysis above produced several interesting findings that answer the questions posed at the beginning of this report.

First, the number of trees planted varies considerably by year and month (Figure 1). The number planted increases steadily from 1989 to 1996. From 1996 to 2013 it fluctuates slightly, then drops significantly from 2013 to 2016, after which it begins to rise slightly again. Most trees were planted in February, and the fewest in July and August. Planting in the rainy season is recommended to help saplings survive, especially in the first few months after planting (https://essc.org.ph/content/view/132/). This may be why the number of trees planted starts increasing in October, peaks in February, and slows down from May, which follows Vancouver's average precipitation trend (https://weather-and-climate.com/average-monthly-precipitation-Rainfall,vancouver,Canada).

Intuitively, trees with larger diameters are usually taller (a positive correlation). Our scatterplot and boxplot support this (Figure 3). However, we do observe a slight decrease in diameter from height range id 8 to 9. This could be due to the smaller number of data points. Another reason could be the way tree height is recorded (as a range id, not an actual height). Collecting more data and recording the actual measured height of each tree would help to confirm the result.

From Figures 5 and 6, we can see that Vancouver's neighbourhoods differ in both the number of trees and the genera planted. The top 5 neighbourhoods by number of trees are Kensington_Cedar Cottage, Renfrew_Collingwood, Hastings_Sunrise, Dunbar_Southlands, and Sunset. The top 5 genera planted are ACER, PRUNUS, TILIA, FRAXINUS, and QUERCUS. By clicking on the interactive plot (Figure 7), we can look in more detail at the top ten genera of each neighbourhood.

For the final interactive dashboard, the goal is to build a tool for government staff or residents who want to understand the distribution and details of trees in their community or neighbourhood. The dashboard presents detailed information about trees, such as genus, diameter and height, year planted, and street side, in a more convenient and efficient way.

This data visualization helped me answer all my questions, and the results met my expectations. In future work, I would like to dig deeper into the tree species and common names in different neighbourhoods. In addition, the current dataset includes only tree counts, not neighbourhood populations. Neighbourhoods with larger populations may plant more trees. If the dataset included the population of each neighbourhood, we could compute the number of trees per person, which would give a better idea of how each community is doing on tree planting.
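
If population figures were available, the per-person comparison could be computed as below. The neighbourhood labels "A" and "B" and every number are placeholders invented for illustration; they are not real tree counts or census values, and the population table is an assumption, since the dataset does not include one.

```python
import pandas as pd

# Placeholder tree counts per neighbourhood (not from the dataset).
tree_counts = pd.Series({"A": 300, "B": 150}, name="trees")

# Placeholder populations (hypothetical; no such column exists in the data).
population = pd.Series({"A": 60000, "B": 15000}, name="population")

# Align the two series on neighbourhood and compute trees per 1,000 residents.
per_capita = pd.concat([tree_counts, population], axis=1)
per_capita["trees_per_1000"] = (
    per_capita["trees"] * 1000 / per_capita["population"]
)
print(per_capita)
```

In this toy example, "A" has twice as many trees as "B" but only half as many trees per resident, which is the kind of difference raw counts would hide.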

Interactive Dashboard

Now I am ready to make the final dashboard. It combines Figure 4 and Figure 7 so that it shows all the tree details for the different Vancouver neighbourhoods. The dashboard is coded as follows and shown in Figure 9.

Figure 9 Dashboard of Vancouver Tree Dataset analysis and visualization

Reference